max rank | avg. rank | sentence |
---|---|---|
295 | 93.3333 | Ol ýurduň iň uly ykdysady, syýasy we medeni merkezi. |
460 | 212.0000 | Ol işläp bermek bilen baglanyşykly bolupdyr. |
554 | 212.3750 | Ol esasy senagat, söwda we ylym merkezi hasaplanýar. |
704 | 288.6000 | Türkmen halkynyň gülläp ösen döwri. |
769 | 261.5455 | Häzirki wagtda Türkmenistan özüni esasy azyk önümleri bilen doly üpjün edýär. |
811 | 213.8182 | Şonuň üçin bu gadymy şäher barada düýpli ylmy maglumatlar juda az. |
815 | 277.4286 | Ol bu babatda dünýäde üçünji ýerde durýar. |
822 | 361.0000 | Olaryň biri «Parfiýa» diýlip atlandyrylýar. |
827 | 313.5714 | Ol Rimiň taryhynda iň uly gozgalaň bolupdyr. |
850 | 416.5000 | Bu ýerleriň medeniýeti umumy görnüşde bolupdyr. |
875 | 458.1667 | Ýagny: uruş yglan etmäge hukuk berilýär. |
931 | 214.7778 | Bu döwürde has medeni taýdan ösen şäher Merw bolupdyr. |
939 | 299.8571 | Ýewropa bilen söwda gatnaşyklary işjeň alyp bardy. |
1000 | 366.6000 | Şonuň bilen birlikde söwda, deňiz ýollary we pul gatnaşyklary ösüpdir. |
1006 | 370.7778 | Bu döwlet b.e. öñ XII asyra çenli dowam edýär. |
1006 | 372.8889 | Onuň has ösen döwri b.e.öň V asyra gabat gelýär. |
1077 | 652.5556 | Italiýa tiz wagtyň içinde demokratik möhüm döwletleriň birine öwrülýär. |
1082 | 378.4444 | Soňra bolsa Günbatar ýurtlaryň üsti bilen geçip Italiýa barýar. |
1085 | 405.6250 | Oguz han türkmenleriň iň gadymy hökümdary hasaplanýar. |
1086 | 470.6000 | Beýik Saparmyrat Türkmenbaşy - Mukaddes Ruhnama. |
1149 | 622.5556 | B.e.öň IV asyryň ortalarynda Rim Italiýada güýçli döwlete öwrülipdir. |
1189 | 430.2000 | Onuň döwründe Rim gaýtadan gurlupdyr. |
1235 | 596.5000 | Türkmenistanyň şäherleri weýran edildi. |
1250 | 613.1667 | Fransiýa, Angliýa, Russiýa bilen gatnaşyklar ösüpdir. |
1255 | 385.6250 | Aziýa halklary dini gatnaşyklary boýunça hem güýçli tapawutlanýar. |
1261 | 410.8333 | Emma onuñ olardan başga köp gyzy bolupdyr we olaryñ atlary belli däldir. |
1286 | 601.4000 | Şeýdip jemgyýetçilik hereket ýokary galdy. |
1337 | 528.2000 | Olar jemgyýetiň ösmegine ýardam edipdirler. |
1338 | 649.0000 | Olaryň hemmesi Türkmenistanyň Gyzyl kitabyna girizildi. |
1377 | 586.5000 | Ol eser Ýewropanyň, Afrikanyň we Aziýanyň käbir halklary barada gyzykly maglumatlar berýär. |
The maximum word rank of a sentence is, by definition, the rank of its rarest word. If this value is low, every word in the sentence is a high-frequency word. For this reason, the table of sentences with the lowest maximum word rank may be of interest. The table lists these sentences, restricted to those longer than 40 characters.
The overall distribution of the maximum rank over all sentences of the corpus is shown in a diagram with a log-scaled x-axis.
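The two numeric columns of the table can be illustrated with a minimal Python sketch. It assumes, as the queries under "Table data" suggest, that word ids are frequency ranks shifted by 100 and that ids up to 100 are not ordinary words; the function name `rank_stats` and the sample ids are hypothetical and not taken from the corpus.

```python
from statistics import mean

OFFSET = 100  # assumption read off the queries: rank = w_id - 100, ids <= 100 are skipped


def rank_stats(word_ids, offset=OFFSET):
    """Return (maximum rank, average rank) for the word ids of one sentence.

    Ids at or below the offset are ignored, mirroring the `w_id > 100` filter.
    Returns None if no ordinary word remains.
    """
    ranks = [w - offset for w in word_ids if w > offset]
    if not ranks:
        return None
    return max(ranks), mean(ranks)


# Hypothetical sentence: three ordinary words and one skipped id (5).
stats = rank_stats([101, 350, 1000, 5])
```

A low first component means that even the rarest word of the sentence is frequent, which is the property the table is sorted by.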
The sentences in the table are of interest because they are usually easy to understand. The distribution may give insight into the corpus and may provide parameters for language comparison.
While the distribution can be estimated from a small corpus, sentences like those in the table are rare, so a large corpus yields more striking results.
Table data:
select max(w_id)-100 as m, avg(w_id)-100 as a, s.sentence from sentences s, inv_w i where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100 group by s.s_id order by m limit 30;
Distribution data:
select m, count(*) from (select 100* round((max(w_id)-100)/100) as m from sentences s, inv_w i where s.s_id=i.s_id and i.w_id>100 group by s.s_id) aa group by m;
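The binning done by the distribution query can be sketched in Python as follows. This is an illustration, not the code used to produce the diagram: it assumes that `round` in the SQL dialect rounds halves upward for positive values, and the function name `max_rank_histogram` is hypothetical. The sample input is the maximum rank of the first seven rows of the table.

```python
from collections import Counter


def max_rank_histogram(max_ranks, width=100):
    """Count sentences per bin, rounding each maximum rank to the nearest multiple of width."""
    bins = Counter()
    for m in max_ranks:
        # Integer arithmetic for round-half-up; Python's built-in round()
        # rounds halves to the nearest even number, which would differ.
        bins[(m + width // 2) // width * width] += 1
    return dict(sorted(bins.items()))


# Maximum ranks of the first seven table rows.
hist = max_rank_histogram([295, 460, 554, 704, 769, 811, 815])
```

Each key of the result is a bin centre on the x-axis of the diagram, and each value is the number of sentences whose maximum rank falls into that bin.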
Explain the distribution, especially the increase in its right part.
4.5.2.2 Average word rank in sentence
4.5.2.3 Sentences consisting of many low frequency words I
4.5.2.4 Sentences consisting of many low frequency words II
4.5.2.5 Sentences consisting of short words only I
4.5.2.6 Sentences consisting of short words only II
4.5.2.7 Sentences consisting of long words only I
4.5.2.8 Sentences consisting of long words only II